Compressive genomics for protein databases
Authors
Abstract
MOTIVATION The exponential growth of protein sequence databases has increasingly made the fundamental task of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Accelerating programs in the popular PSI/DELTA-BLAST family of tools will speed up not only homology search itself but also the large collection of other current programs that interact with large protein databases primarily through these tools.
RESULTS We introduce a suite of homology search tools, powered by compressively accelerated protein BLAST (CaBLASTP), which are significantly faster than, and comparably accurate to, all known state-of-the-art tools, including HHblits, DELTA-BLAST and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP's runtime scales almost linearly in the amount of unique data, whereas the runtime of current BLASTP variants scales linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks that rely heavily on homology search, such as protein structure prediction and orthology mapping.
AVAILABILITY CaBLASTP is available under the GNU Public License at http://cablastp.csail.mit.edu/
CONTACT [email protected].
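The idea of searching a compressed, non-redundant representation first and then expanding only the promising hits can be illustrated with a toy sketch. This is not CaBLASTP's algorithm: the k-mer Jaccard score stands in for a real alignment score, the greedy grouping stands in for the local similarity-based compression, and every name and threshold here (`compress`, `search`, `coarse`, `fine`, the example sequences) is a hypothetical choice made for illustration.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (a crude stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def compress(db: Dict[str, str], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Greedy grouping: each sequence joins the first representative it
    resembles, otherwise it becomes a new representative (unique data)."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in db.items():
        for rep in clusters:
            if similarity(seq, db[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


def search(query: str, db: Dict[str, str], clusters: Dict[str, List[str]],
           coarse: float = 0.3, fine: float = 0.5) -> List[Tuple[str, float]]:
    """Coarse pass over representatives only, then a fine pass over the
    members of the groups whose representative passed the coarse cut."""
    hits: List[Tuple[str, float]] = []
    for rep, members in clusters.items():
        if similarity(query, db[rep]) < coarse:
            continue  # the whole group is skipped without touching its members
        for name in members:
            score = similarity(query, db[name])
            if score >= fine:
                hits.append((name, score))
    return sorted(hits, key=lambda h: -h[1])


# Hypothetical toy database: p2 is a one-residue variant of p1, p3 is unrelated.
db = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRA",
    "p3": "GSSGSSGLVPRGSHMASMTGGQ",
}
clusters = compress(db)
hits = search("MKTAYIAKQRQISFVKSHFSRQ", db, clusters)
print(clusters)
print(hits)
```

In this sketch the coarse pass costs one comparison per representative, so its work grows with the number of groups rather than with the total number of database entries; the permissive coarse threshold followed by a stricter fine threshold is an assumption chosen to limit the hits lost by comparing against representatives only.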
Similar resources
Website Review: Protein-Protein Interactions on the Web
We present a brief guide to resources on the Internet relating to Protein-Protein Interactions. These include databases containing experimentally verified and computationally inferred physical and functional interactions. There are also tools for predicting interactions and for extracting information on interactions from the literature, and organism specific databases.
A Study of Functional Genomics for Unknown Proteins in Chlamydomonas reinhardtii
Chlamydomonas reinhardtii is a unicellular green alga that has been used as a reference organism for identifying proteins. Five hundred hypothetical proteins in Chlamydomonas reinhardtii have been sequenced to determine the functions of the proteins in their families. The functions of these five hundred hypothetical proteins were predicted using bioinformatics web tools. The web...
Functional inferences from reconstructed evolutionary biology involving rectified databases--an evolutionarily grounded approach to functional genomics.
If bioinformatics tools are constructed to reproduce the natural, evolutionary history of the biosphere, they offer powerful approaches to some of the most difficult tasks in genomics, including the organization and retrieval of sequence data, the updating of massive genomic databases, the detection of database error, the assignment of introns, the prediction of protein conformation from protei...
Prediction of protein-protein interaction based on structure.
A great challenge in the proteomics and structural genomics era is to predict protein structure and function from sequence, including the identification of biological partners. The development of a procedure to construct position-specific scoring matrices for the prediction and identification of sequences with putative significant affinity faces this challenge. The local and web applications us...
Comparative Functional Genomics Studies for Understanding the Hypothetical Proteins in Mycobacterium tuberculosis KZN 1435
Predictions for the unknown proteins from Mycobacterium tuberculosis KZN 1435 were carried out to characterize the proteins in their respective families. Of the 1560 genes for hypothetical proteins in Mycobacterium tuberculosis KZN 1435, functions were predicted for 1221 hypothetical proteins, whereas structures were revealed for 803 unknown proteins. The Bioinformatics web tools like...
Journal title:
Volume 29, Issue
Pages -
Publication date: 2013